Structurally Informed Mutagenesis of a Stereochemically Promiscuous Aldolase Produces Mutants That Catalyze the Diastereoselective Syntheses of All Four Stereoisomers of 3-Deoxy-hexulosonic Acid

A 2-keto-3-deoxygluconate aldolase from the hyperthermophile Sulfolobus solfataricus catalyzes the nonstereoselective aldol reaction of pyruvate and d-glyceraldehyde to produce 2-keto-3-deoxygluconate (d-KDGlc) and 2-keto-3-deoxy-d-galactonate (d-KDGal). Previous investigations into curing the stereochemical promiscuity of this hyperstable aldolase used high-resolution structures of the aldolase bound to d-KDGlc or d-KDGal to identify critical amino acids involved in substrate binding for mutation. This structure-guided approach enabled mutant variants to be created that could stereoselectively catalyze the aldol reaction of pyruvate and natural d-glyceraldehyde to selectively afford d-KDGlc or d-KDGal. Here we describe the creation of two further mutants of this Sulfolobus aldolase that can be used to catalyze aldol reactions between pyruvate and non-natural l-glyceraldehyde to enable the diastereoselective synthesis of l-KDGlc and l-KDGal. High-resolution crystal structures of all four variant aldolases have been determined (both unliganded and liganded), including Variant 1 with d-KDGlc, Variant 2 with pyruvate, Variant 3 with l-KDGlc, and Variant 4 with l-KDGal. These structures have enabled us to rationalize the observed changes in diastereoselectivities in these variant-catalyzed aldol reactions at a molecular level. Interestingly, the active site of Variant 4 was found to be sufficiently flexible to enable catalytically important amino acids to be replaced while still retaining sufficient enzymic activity to enable production of l-KDGal.


Preparation of protein and substrate coordinates
Molecular dynamics (MD) simulations were performed to model the state in which a pyruvate-lysine Schiff base (KPI) is present together with non-covalently bound glyceraldehyde (D- or L-). In total, 12 systems of SsKDG-aldolase variants were prepared, based on X-ray data as outlined in Table S2 (see Table S1 for crystal structure details). WT SsKDG-aldolase and Variants 1 and 2 were prepared in complex with D-glyceraldehyde, positioned either to form D-KDGlc or to form D-KDGal. WT SsKDG-aldolase and Variants 3 and 4 were prepared in complex with L-glyceraldehyde, positioned either to form L-KDGlc or to form L-KDGal. Coordinates of variants with high-resolution crystal structures were selected and assembled with appropriate substrate coordinates using PyMOL (The PyMOL Molecular Graphics System, Version 2.3.0, Schrödinger, LLC.). The setup of the 12 systems is shown in Table S2, which indicates the coordinate sources for SsKDG-aldolase and the substrates. Note that two crystal structures of Variant 3 are low in resolution (Table S1); the setup of Variant 3 is therefore based on the Variant 2 structures, because the two variants differ by only a single mutation (Q/D181).

Molecular dynamics setup and simulations
Glyceraldehyde was modelled only in chain A of the tetramer in all systems. Water molecules from the SsKDG-aldolase structures were kept. The Enlighten2 PREP protocol (see: https://github.com/vanderkamp/enlighten2) was used to generate Amber topology and coordinate files for further MD simulations. 2 This involves addition of hydrogens to the protein using the AmberTools18 programs pdb4amber and reduce. 3 Protonation states of titratable residues were predicted by PropKa 3.1. 4,5 Histidine tautomers (predicted by reduce) were manually edited to ensure consistency within variants (see Table S2). All ionizable residues were in their standard protonation states (Asp and Glu negatively charged, Lys and Arg positively charged). A 30 Å solvent sphere was added by the AmberTools18 3 program tleap, centered on the carbon atom of pyruvate that forms a carbon-carbon bond with D- or L-glyceraldehyde. The ff14SB force field was used to treat the protein and the TIP3P model was used for water. 6 The AmberTools18 programs antechamber, parmchk2 and sqm were used for parameterization of the ligand with the General Amber Force Field version 2 (GAFF2). 2,7 Parameterization of the pyruvate-lysine Schiff base (KPI) was performed using the RED server (https://upjv.q4md-forcefieldtools.org/) 9 , with partial charges obtained from restrained electrostatic potential (RESP) fitting and missing force field parameters taken from analogous GAFF parameters. Parameter files for KPI are available as Supporting Information files (KPI.off and KPI.frcmod).
After preparation of the systems, the following steps were performed using the AmberTools program sander to relax the structures: 1) brief energy minimization of all hydrogens (with positional restraints of 50 kcal mol⁻¹ Å⁻² on all other atoms); 2) brief energy minimization of hydrogens within 22 Å of the solvent sphere center; 3) energy minimization of all atoms within 22 Å of the solvent sphere center.
After this minimization protocol, the carbon-carbon distance between glyceraldehyde and KPI was consistent with a 'near attack' or 'reaction ready' pose (between 2.8 and 3.4 Å).
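As a minimal illustration of the distance check described above, the following Python sketch computes a carbon-carbon distance from two Cartesian coordinates and tests it against the 2.8 to 3.4 Å window. The function name `is_near_attack` and the example coordinates are hypothetical and are not taken from the simulated structures.

```python
import math

# Distance window (in Å) quoted in the text for a 'near attack' pose
NEAR_ATTACK_RANGE = (2.8, 3.4)


def is_near_attack(c_kpi, c_aldehyde, window=NEAR_ATTACK_RANGE):
    """Return (distance, flag) for two (x, y, z) coordinates given in Å.

    c_kpi: coordinate of the pyruvate (KPI) carbon that forms the new bond.
    c_aldehyde: coordinate of the glyceraldehyde carbonyl carbon.
    """
    distance = math.dist(c_kpi, c_aldehyde)
    return distance, window[0] <= distance <= window[1]


# Hypothetical coordinates, for illustration only
dist, ok = is_near_attack((0.0, 0.0, 0.0), (1.8, 1.8, 1.8))
```

In practice the two coordinates would be read from the minimized structure of each system rather than typed in by hand.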
MD simulations were based on the Enlighten2 dynamics protocol. 2 Prior to the 'production' MD simulations, a 5 ps heating process to 300 K and subsequently 50 ps of dynamics were performed with a carbon-carbon distance restraint (3.5 Å, force constant: 100 kcal mol⁻¹ Å⁻²). Production simulations (100 ps, 1000 snapshots) were then performed without the distance restraint. During simulation, atoms within 26 Å of the solvent sphere center were free to move, SHAKE was applied to bonds containing hydrogens and a 2 fs timestep was employed. Ten independent sets of simulations (heating, 50 ps MD with carbon-carbon distance restraint, 100 ps without restraint) were run for all 12 systems, and the resulting trajectories were analyzed with the AmberTools18 program CPPTRAJ. 8 The percentage of 'reactive' poses (for formation of KDGlc or KDGal) was calculated from the trajectories based on the following geometric criteria: (1) the stereoisomer expected to form (KDGlc/KDGal) was 'predicted' from the approach of the aldol group to the pyruvate (KPI) carbon, expressed as a dihedral angle between the plane of the aldol group (carbon, oxygen and hydrogen) and the plane formed by the pyruvate carbon, aldol carbon and aldol oxygen (Figure S6, with D-glyceraldehyde as the starting structure); (2) the distance between the pyruvate carbon of KPI and the substrate carbon indicates whether the carbon-carbon bond is ready to be formed, with a cut-off value of less than 4.0 Å (shorter cut-off distances give similar results).
Figure S6. Dihedral angle used to indicate pre-S/pre-R poses. The dihedral angle is measured between the plane formed by the pyruvate carbon, aldol carbon and aldol oxygen and the plane of the aldol group (carbon, oxygen and hydrogen). Taking D-glyceraldehyde as the starting structure, a dihedral angle between -180° and 0° favors the pre-R (D-KDGal) geometry, whilst a dihedral angle between 0° and 180° favors the pre-S (D-KDGlc) geometry.
The positive and negative ranges of dihedral angles for the expected stereoisomer are reversed for the L-glyceraldehyde starting structure.
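The two geometric criteria above can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the CPPTRAJ analysis that was actually run. The atom ordering used for the dihedral (pyruvate carbon, aldol carbon, aldol oxygen, aldol hydrogen), the sign convention, the function names and the example values are all assumptions made for illustration. The mapping of positive angles to KDGlc for D-glyceraldehyde, and the reverse for L-glyceraldehyde, follows the description of Figure S6.

```python
import numpy as np


def dihedral_deg(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) defined by four points.

    Assumed atom order: pyruvate C, aldol C, aldol O, aldol H, so that the
    planes are (p0, p1, p2) and (p1, p2, p3).
    """
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def reactive_pose_percentages(dihedrals_deg, cc_distances, substrate="D", cutoff=4.0):
    """Percentage of frames that are 'reactive' towards KDGlc or KDGal.

    A frame counts as reactive when the C-C distance is below the cutoff (Å);
    the sign of the dihedral then selects the predicted product.
    """
    dihedrals = np.asarray(dihedrals_deg, dtype=float)
    close = np.asarray(cc_distances, dtype=float) < cutoff
    positive = dihedrals > 0.0
    # D-glyceraldehyde: positive angle -> KDGlc; reversed for L-glyceraldehyde
    glc_like = positive if substrate == "D" else ~positive
    n_frames = len(dihedrals)
    return {
        "KDGlc": 100.0 * np.count_nonzero(close & glc_like) / n_frames,
        "KDGal": 100.0 * np.count_nonzero(close & ~glc_like) / n_frames,
    }


# Hypothetical per-frame values, for illustration only
example = reactive_pose_percentages(
    [60.0, 70.0, -50.0, 65.0], [3.2, 3.9, 3.5, 4.5], substrate="D"
)
```

Frames with a carbon-carbon distance at or beyond the cutoff contribute to neither percentage, so the two values need not sum to 100.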

Molecular dynamics simulations results
The orientation of reactive poses from all MD simulations, calculated using the above criteria, is shown in Figure S7. The variants in Figure S7 (a) (WT, Variant 1 and Variant 2) take D-glyceraldehyde as the substrate, and can form D-KDGlc or D-KDGal. According to the glyceraldehyde complex simulations, these three SsKDG-aldolase variants show a clear preference for binding modes leading to D-KDGlc. This is not consistent with the preference indicated by the enzyme assay results (see Figure 1 [...] Figure S7 (b)), indicate a preference for binding modes leading to L-KDGal. This again is not consistent with the experimental stereoselectivity (Figure 1 [...] Table S2. V2 D-KDGlc is not shown due to its high preference for the D-KDGlc conformation (high peak at pre-S, around 60°).

General experimental details
Reagents and solvents were obtained from commercial suppliers and used without further purification. Reactions were performed without air exclusion or drying, at room temperature, with magnetic stirring, unless otherwise stated. Anhydrous Na2SO4 was used as the drying agent for organic solvents.
Capillary melting points were determined using a Stuart digital SMP10 melting point apparatus and are reported uncorrected to the nearest °C. Optical rotations were measured using an Optical Activity Ltd AA-10 Series Automatic Polarimeter, with a path length of 1 dm, and with concentration (c) quoted in g/100 mL.
Nuclear Magnetic Resonance (NMR) spectroscopy experiments were performed in deuterated solvent at 298 K (unless stated otherwise) on an Agilent ProPulse 500 MHz spectrometer. ¹H and ¹³C chemical shifts (δ) are quoted in parts per million (ppm) and are referenced to either the residual solvent peak or tetramethylsilane (TMS) when possible. Coupling constants (J) are quoted in Hz.
Infrared (IR) spectra were recorded using a PerkinElmer Spectrum 100 FTIR spectrometer fitted with a Universal ATR FTIR accessory, with samples run neat and the most relevant, characteristic absorbances quoted as ν in cm⁻¹.
High resolution mass spectrometry (HRMS) results were acquired on an externally calibrated Agilent QTOF 6545 with Jetstream ESI. Molecular ions were detected in negative ionisation mode as deprotonated/desodiated species.

1,2-5,6-Di-O-isopropylidene-D-mannitol
Following the method of De Alvarenga et al.: 10 ZnCl2 (20.0 g, 146 mmol, 2.66 equiv.) was dissolved in acetone (anhydrous, 200 mL) under an N2 atmosphere at room temperature. D-Mannitol (10.0 g, 54.8 mmol, 1.00 equiv.) was added to this solution in one batch, and the reaction was stirred at room temperature until full dissolution (5 h). After this time, the reaction was cooled to 0 °C, a suspension of NaHCO3 (20 g) in water (20 mL) was added, and the mixture was stirred vigorously for 15 min. The mixture was filtered to remove the precipitated zinc carbonate, which was washed with acetone (20 mL), and the crude product was concentrated to dryness in vacuo. The crude product was dissolved in Et2O and water (25 mL each), the phases were separated, and the aqueous phase was extracted twice more with Et2O (2 × 25 mL). The combined organics were then washed with brine (50 mL), dried over Na2SO4, and concentrated to dryness in vacuo. The crude product was triturated with n-hexane to afford 1,2-5,6-di-O-isopropylidene-D-mannitol (7.085 g, 26.9 mmol) as a white solid in 49% yield. Characterization data were consistent with previous literature reports. 10,11 [α]D 23 = +7 (CHCl3, c = 1.0) (lit. 11 [...] and MgSO4 (0.5 g) were added, and the mixture was stirred for a further 15 min. The reaction was then filtered, and the solvent removed in vacuo at room temperature. The residue was purified by bulb-to-bulb distillation at atmospheric pressure under N2 to yield D-glyceraldehyde acetonide (0.931 g, 7.12 mmol) as a colourless oil in 89% yield. This intermediate could also be purified by column chromatography (SiO2, 30% EtOAc n-hexane/EtOAc). Characterization data were consistent with previous literature reports. 12,13 This intermediate was used immediately, owing to its low stability, to avoid degradation and polymerization. If stored at -18 °C for extended periods, redistillation is required in order to crack the reagent prior to use.
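The stoichiometry quoted for the 1,2-5,6-di-O-isopropylidene-D-mannitol preparation at the start of this section can be checked with a short Python sketch. The molar masses below are standard values supplied here as assumptions; they are not quoted in the text. The computed values agree with the quoted ones apart from small rounding differences in the last digit.

```python
# Molar masses in g/mol: standard values, assumed here (not quoted in the text)
MOLAR_MASS = {
    "D-mannitol": 182.17,   # C6H14O6
    "ZnCl2": 136.29,
    "diacetonide": 262.30,  # C12H22O6, 1,2-5,6-di-O-isopropylidene-D-mannitol
}


def mmol(mass_g, compound):
    """Convert a mass in grams to an amount in millimoles."""
    return 1000.0 * mass_g / MOLAR_MASS[compound]


# Masses taken from the procedure above
n_mannitol = mmol(10.0, "D-mannitol")
n_zncl2 = mmol(20.0, "ZnCl2")
n_product = mmol(7.085, "diacetonide")

# Equivalents of ZnCl2 relative to D-mannitol, and isolated yield
equiv_zncl2 = n_zncl2 / n_mannitol
yield_percent = 100.0 * n_product / n_mannitol

print(f"D-mannitol: {n_mannitol:.1f} mmol")
print(f"ZnCl2: {n_zncl2:.1f} mmol ({equiv_zncl2:.2f} equiv.)")
print(f"Product: {n_product:.1f} mmol ({yield_percent:.0f}% yield)")
```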

D-Glyceraldehyde.
D-Glyceraldehyde acetonide (1.0 equiv.) was added to a mixture of TFA/water (80 mM, pH 1-1.5, 20 mL/mmol of substrate). After stirring for 1.5 h at room temperature, the mixture was concentrated in vacuo to afford D-glyceraldehyde as an oily film in quantitative yield. This product was used immediately due to its low stability, to avoid degradation and polymerization.
[...] After this time, the reaction mixture was diluted with Et2O (35 mL), and 0.64 mL of water added dropwise over 5 min. The reaction was then stirred vigorously and warmed to 0 °C, and 1.0 mL of 4.0 M NaOH added. After a further 5 min of vigorous stirring, another 1.5 mL of water was added to the reaction, followed by warming to room temperature and stirring for 15 min. The reaction mixture was then dried using MgSO4, filtered, and solvents were removed in vacuo. The residue was purified by bulb-to-bulb distillation at atmospheric pressure under N2 to yield D-glyceraldehyde acetonide (1.640 g, 12.6 mmol) as a colourless oil in 68% yield. This intermediate could also be purified by column chromatography (SiO2, 30% EtOAc n-hexane/EtOAc). Characterization data were consistent with previous literature reports. 10,11 This intermediate was used immediately due to its low stability to avoid degradation and polymerization. If stored at -18 °C for extended periods, redistillation is required in order to crack the reagent prior to use.

L-Glyceraldehyde.
L-Glyceraldehyde was prepared via acid-catalysed hydrolysis of L-glyceraldehyde acetonide, as described previously for the formation of D-glyceraldehyde (see section 4.2). This product was used immediately, owing to its low stability, to avoid degradation and polymerization.

Synthesis and characterization of KDGal and KDGlc stereoisomers
All four KDGal/KDGlc stereoisomers were individually prepared in >95:5 dr using the appropriate variant aldolase to catalyse aldol reactions between pyruvate and D- or L-glyceraldehyde, using the preparative/purification procedures described in the Methods and Materials section of the main article. Structural characterization of the KDGlc and KDGal products was hindered by the lack of high-quality ¹H/¹³C NMR spectroscopic characterization data reported for these compounds in the literature. The ¹H/¹³C NMR spectra of KDGlc and KDGal are complicated by: (a) the presence of multiple interconverting open-chain, α-/β-pyranose and α-/β-furanose species in aqueous solution (shown for KDGlc in Scheme S1a); (b) the potential of the keto groups of KDGlc and KDGal to undergo keto-enol tautomerism (shown for KDGlc in Scheme S1b); (c) hydration of the keto groups of KDGlc and KDGal to form gem-diols (shown for KDGlc in Scheme S1c). 16 The ratios of interconverting KDGal/KDGlc species present in aqueous solution are also dependent on concentration, pH and the presence of any impurities, which combine to further complicate NMR analysis of KDGlc/KDGal mixtures. Consequently, NMR analysis was not used to determine dr values for KDGlc/KDGal formation in these aldolase-catalyzed reactions, with aldol product ratios instead determined more accurately by HPLC analysis as reported previously (vide supra). 14,17